We define a family of $C^1$ functions which we call "nowhere coexpanding
functions" that is closed under composition and includes all $C^3$ functions
with non-positive Schwarzian derivative. We establish results on the number and
nature of the fixed points of these functions, including a generalisation of a
classic result of Singer.
( 2 min )
Feature generation aims to generate new and meaningful features to create a
discriminative representation space. A generated feature is meaningful when it
comes from a feature pair with an inherent feature interaction. In
the real world, experienced data scientists can identify potentially useful
feature-feature interactions, and generate meaningful dimensions from an
exponentially large search space, in an optimal crossing form over an optimal
generation path. But machines have limited human-like abilities. We generalize
such learning tasks as self-optimizing feature generation. Self-optimizing
feature generation imposes several under-addressed challenges on existing
systems: meaningful, robust, and efficient generation. To tackle these
challenges, we propose a principled and generic representation-crossing
framework to solve self-optimizing feature generation. To achieve hashing
representation, we propose a three-step approach: feature discretization,
feature hashing, and descriptive summarization. To achieve reinforcement
crossing, we develop a hierarchical reinforcement feature crossing approach. We
present extensive experimental results to demonstrate the effectiveness and
efficiency of the proposed method. The code is available at
https://github.com/yingwangyang/HRC_feature_cross.git.
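As a rough illustration of the hashing-representation idea, the first two steps (feature discretization, then feature hashing of a crossed pair) might be sketched as below; the function names, equal-width binning, and table size are illustrative assumptions, not the authors' implementation.

```python
import hashlib

def discretize(value, n_bins):
    """Step 1: map a continuous value in [0, 1) to an equal-width bin index."""
    return min(int(value * n_bins), n_bins - 1)

def hash_crossed_feature(name_a, bin_a, name_b, bin_b, table_size):
    """Step 2: hash a discretized feature pair into a fixed-size space, so a
    crossed feature becomes a single index regardless of its cardinality."""
    key = f"{name_a}={bin_a}&{name_b}={bin_b}"
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % table_size

# Cross two discretized features into one hashed dimension.
idx = hash_crossed_feature("age", discretize(0.37, 10),
                           "income", discretize(0.82, 10), 2**16)
assert 0 <= idx < 2**16
```

Step 3 (descriptive summarization) would then aggregate statistics over the hashed dimensions to describe the generated representation.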
( 2 min )
Effectively leveraging multimodal information from social media posts is
essential to various downstream tasks such as sentiment analysis, sarcasm
detection and hate speech classification. However, combining text and image
information is challenging because of the idiosyncratic cross-modal semantics
with hidden or complementary information present in matching image-text pairs.
In this work, we aim to directly model this by proposing the use of two
auxiliary losses jointly with the main task when fine-tuning any pre-trained
multimodal model. Image-Text Contrastive (ITC) brings image-text
representations of a post closer together and separates them from different
posts, capturing underlying dependencies. Image-Text Matching (ITM) facilitates
the understanding of semantic correspondence between images and text by
penalizing unrelated pairs. We combine these objectives with five multimodal
models, demonstrating consistent improvements across four popular social media
datasets. Furthermore, through detailed analysis, we shed light on the specific
scenarios and cases where each auxiliary task proves to be most effective.
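As a hedged sketch of the Image-Text Contrastive objective (ITM is analogous but framed as binary matched/unmatched classification), a NumPy version of the symmetric contrastive loss might look as follows; the temperature value and in-batch negative construction are illustrative assumptions.

```python
import numpy as np

def itc_loss(img_emb, txt_emb, temperature=0.07):
    """Image-Text Contrastive loss: the i-th image and i-th text in the batch
    form the matching pair; all other combinations act as negatives."""
    logits = img_emb @ txt_emb.T / temperature      # pairwise similarities
    idx = np.arange(len(img_emb))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)        # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()          # diagonal = matched pairs

    # Symmetric: image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Aligned embeddings give a near-zero loss; shuffled pairings do not.
aligned = np.eye(4)
assert itc_loss(aligned, aligned) < 0.01
assert itc_loss(aligned, aligned[::-1]) > 1.0
```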
( 2 min )
Reasoning, as an essential ability for complex problem-solving, can provide
back-end support for various real-world applications, such as medical
diagnosis, negotiation, etc. This paper provides a comprehensive survey of
cutting-edge research on reasoning with language model prompting. We introduce
research works with comparisons and summaries and provide systematic resources
to help beginners. We also discuss the potential reasons for the emergence of
such reasoning abilities and highlight future research directions. Resources are
available at https://github.com/zjunlp/Prompt4ReasoningPapers (updated
periodically).
( 2 min )
In this work, we provide a characterization of the feature-learning process
in two-layer ReLU networks trained by gradient descent on the logistic loss
following random initialization. We consider data with binary labels that are
generated by an XOR-like function of the input features. We permit a constant
fraction of the training labels to be corrupted by an adversary. We show that,
although linear classifiers are no better than random guessing for the
distribution we consider, two-layer ReLU networks trained by gradient descent
achieve generalization error close to the label noise rate. We develop a novel
proof technique that shows that at initialization, the vast majority of neurons
function as random features that are only weakly correlated with useful
features, and the gradient descent dynamics 'amplify' these weak, random
features to strong, useful features.
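The XOR-like setting can be made concrete with a small sketch; the dimension, sample size, and noise-free label rule below are illustrative assumptions, not the paper's exact distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR-like labels: y depends on the parity of the signs of the first two features.
n, d = 2000, 10
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0]) * np.sign(X[:, 1])

# Every individual coordinate is (nearly) uncorrelated with y, so any linear
# classifier is close to random guessing on this distribution.
correlations = (X * y[:, None]).mean(axis=0)
assert np.abs(correlations).max() < 0.25

# A nonlinear feature (the product of the first two coordinates) separates the
# classes perfectly, which is what hidden ReLU units can learn to emulate.
assert np.all(np.sign(X[:, 0] * X[:, 1]) == y)
```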
( 2 min )
The primary goal of this research is to propose a novel architecture for a
deep neural network that can solve fractional differential equations
accurately. A Gaussian integration rule and an $L_1$ discretization technique
are used in the proposed design. In each equation, a deep neural network is
used to approximate the unknown function. Three forms of fractional
differential equations have been examined to highlight the method's
versatility: a fractional ordinary differential equation, a fractional order
integrodifferential equation, and a fractional order partial differential
equation. The results show that the proposed architecture solves different
forms of fractional differential equations with excellent precision.
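For reference, the $L_1$ discretization of a Caputo derivative of order $0<\alpha<1$ can be sketched as below; this is the standard scheme, not necessarily the exact formulation embedded in the proposed network.

```python
import math
import numpy as np

def caputo_l1(u, dt, alpha):
    """L1 discretization of the Caputo derivative of order 0 < alpha < 1:
    D^a u(t_j) ~ dt^(-a)/Gamma(2-a) * sum_k b_k (u_{j-k} - u_{j-k-1}),
    with weights b_k = (k+1)^(1-a) - k^(1-a), on a uniform grid."""
    n = len(u) - 1
    b = np.array([(k + 1) ** (1 - alpha) - k ** (1 - alpha) for k in range(n)])
    c = dt ** (-alpha) / math.gamma(2 - alpha)
    out = np.empty(n)
    for j in range(1, n + 1):
        diffs = u[1:j + 1] - u[:j]                # forward differences of u
        out[j - 1] = c * np.dot(b[:j][::-1], diffs)
    return out

# Sanity check: for u(t) = t the scheme is exact (by telescoping), since
# the Caputo derivative of t is t^(1-a) / Gamma(2-a).
t = np.linspace(0.0, 1.0, 101)
approx = caputo_l1(t, t[1] - t[0], 0.5)
exact = t[1:] ** 0.5 / math.gamma(1.5)
assert np.allclose(approx, exact)
```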
( 2 min )
We present a novel local-global feature fusion framework for body-weight
exercise recognition with floor-based dynamic pressure maps. Going one step
further than existing studies, which use deep neural networks mainly for global
feature extraction, the proposed framework combines local and global features,
using image processing techniques and YOLO object detection to localize
pressure profiles from different body parts and to account for physical
constraints. The proposed local feature extraction method generates two sets of
high-level local features consisting of cropped pressure mapping and numerical
features such as angular orientation, location on the mat, and pressure area.
In addition, we adopt knowledge distillation for regularization to preserve
the knowledge of the global feature extraction and improve the performance of
the exercise recognition. Our experimental results demonstrate a notable 11
percent improvement in F1 score for exercise recognition while preserving
label-specific features.
( 2 min )
In the presence of right-censored data with covariates, the conditional
Kaplan-Meier estimator (also known as the Beran estimator) consistently
estimates the conditional survival function of the random follow-up for the
event of interest. However, a necessary condition is the unambiguous knowledge
of whether each individual is censored or not, which may be incomplete in
practice. We therefore propose a study of the Beran estimator when the
censoring indicators are generic random variables and discuss necessary
conditions for the efficiency of the Beran estimator. From this, we provide a
new estimator for the conditional survival function with missing not at random
(MNAR) censoring indicators based on a conditional copula model for the
missingness mechanism. In addition to the theoretical results, we illustrate
how the estimators work for small samples through a simulation study and show
their practical applicability by analyzing synthetic and real data.
( 2 min )
The task of preserving privacy while ensuring efficient communication is a
fundamental challenge in federated learning. In this work, we tackle this
challenge in the trusted aggregator model, and propose a solution that achieves
both objectives simultaneously. We show that employing a quantization scheme
based on subtractive dithering at the clients can effectively replicate the
normal noise addition process at the aggregator. This implies that we can
guarantee the same level of differential privacy against other clients while
substantially reducing the amount of communication required, as opposed to
transmitting full precision gradients and using central noise addition. We also
experimentally demonstrate that the accuracy of our proposed approach matches
that of the full precision gradient method.
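The subtractive-dithering mechanism can be sketched as follows; the step size and the classical uniform-error property are stated here as a rough illustration of why the quantization error can stand in for aggregator-side noise, not as the paper's full privacy accounting.

```python
import numpy as np

rng = np.random.default_rng(1)

def subtractive_dither_quantize(x, step, u):
    """Client: add a shared dither u ~ Uniform(-step/2, step/2) and round to a
    grid of spacing `step` (transmitted as low-rate integer indices).
    Aggregator: subtract the same dither. The end-to-end error is then uniform
    noise independent of x, which can replicate central noise addition."""
    q = step * np.round((x + u) / step)
    return q - u

x = rng.standard_normal(10_000)          # stand-in for gradient coordinates
step = 0.5
u = rng.uniform(-step / 2, step / 2, size=x.shape)
err = subtractive_dither_quantize(x, step, u) - x

# The reconstruction error behaves like Uniform(-step/2, step/2):
assert np.abs(err).max() <= step / 2 + 1e-9
assert abs(err.mean()) < 0.01
assert abs(err.var() - step**2 / 12) < 2e-3
```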
( 2 min )
The recipe behind the success of deep learning has been the combination of
neural networks and gradient-based optimization. Understanding the behavior of
gradient descent however, and particularly its instability, has lagged behind
its empirical success. To add to the theoretical tools available to study
gradient descent we propose the principal flow (PF), a continuous time flow
that approximates gradient descent dynamics. To our knowledge, the PF is the
only continuous flow that captures the divergent and oscillatory behaviors of
gradient descent, including escaping local minima and saddle points. Through
its dependence on the eigendecomposition of the Hessian the PF sheds light on
the recently observed edge of stability phenomena in deep learning. Using our
new understanding of instability we propose a learning rate adaptation method
which enables us to control the trade-off between training stability and test
set evaluation performance.
( 2 min )
Markov processes are widely used mathematical models for describing dynamic
systems in various fields. However, accurately simulating large-scale systems
at long time scales is computationally expensive due to the short time steps
required for accurate integration. In this paper, we introduce an inference
process that maps complex systems into a simplified representational space and
models large jumps in time. To achieve this, we propose Time-lagged Information
Bottleneck (T-IB), a principled objective rooted in information theory, which
aims to capture relevant temporal features while discarding high-frequency
information to simplify the simulation task and minimize the inference error.
Our experiments demonstrate that T-IB learns information-optimal
representations for accurately modeling the statistical properties and dynamics
of the original process at a selected time lag, outperforming existing
time-lagged dimensionality reduction methods.
( 2 min )
The robotic manipulation of Deformable Linear Objects (DLOs) is a vital and
challenging task that is important in many practical applications. Classical
model-based approaches to this problem require an accurate model to capture how
robot motions affect the deformation of the DLO. Nowadays, data-driven models
offer the best tradeoff between quality and computation time. This paper
analyzes several learning-based 3D models of the DLO and proposes a new one
based on the Transformer architecture that achieves superior accuracy, even on
DLOs of different lengths, thanks to the proposed scaling method. Moreover,
we introduce a data augmentation technique, which improves the prediction
performance of almost all considered DLO data-driven models. Thanks to this
technique, even a simple Multilayer Perceptron (MLP) achieves close to
state-of-the-art performance while being significantly faster to evaluate. In
the experiments, we compare the performance of the learning-based 3D models of
the DLO on several challenging datasets quantitatively and demonstrate their
applicability in the task of shaping a DLO.
( 2 min )
We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE)
architecture using a split vector quantizer for NTTS, as an enhancement to the
well-known Variational Autoencoder (VAE) and Vector Quantized Variational
Autoencoder (VQ-VAE) architectures. Compared to these previous architectures,
our proposed model retains the benefits of using an utterance-level bottleneck,
while keeping significant representation power and a discretized latent space
small enough for efficient prediction from text. We train the model on
recordings in the expressive task-oriented dialogues domain and show that
SVQ-VAE achieves a statistically significant improvement in naturalness over
the VAE and VQ-VAE models. Furthermore, we demonstrate that the SVQ-VAE latent
acoustic space is predictable from text, reducing the gap between the standard
constant vector synthesis and vocoded recordings by 32%.
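The split vector quantizer can be sketched as partitioning the utterance-level latent into chunks, each snapped to its own small codebook; the chunk count and toy codebooks below are illustrative assumptions, not the model's trained parameters.

```python
import numpy as np

def split_vector_quantize(z, codebooks):
    """Quantize latent z by splitting it into equal chunks and mapping each
    chunk to the nearest entry of its own codebook. The product of small
    codebooks yields a large effective codebook that stays small enough for
    efficient prediction from text."""
    chunks = np.split(z, len(codebooks))
    quantized, indices = [], []
    for chunk, cb in zip(chunks, codebooks):
        i = int(np.argmin(((cb - chunk) ** 2).sum(axis=1)))  # nearest code
        quantized.append(cb[i])
        indices.append(i)
    return np.concatenate(quantized), indices

# Two chunks of size 2, each with a 2-entry codebook: 4 combinations total.
books = [np.array([[0.0, 0.0], [1.0, 1.0]]),
         np.array([[0.0, 0.0], [1.0, 1.0]])]
zq, idx = split_vector_quantize(np.array([0.9, 1.1, 0.1, -0.2]), books)
assert idx == [1, 0]
```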
( 2 min )
Integrating variable renewable energy into the grid has posed challenges to
system operators in achieving optimal trade-offs among energy availability,
cost affordability, and pollution controllability. This paper proposes a
multi-agent reinforcement learning framework for managing energy transactions
in microgrids. The framework addresses the challenges above: it seeks to
optimize the usage of available resources by minimizing the carbon footprint
while benefiting all stakeholders. The proposed architecture consists of three
layers of agents, each pursuing different objectives. The first layer,
comprised of prosumers and consumers, minimizes the total energy cost. The
other two layers control the energy price to decrease the carbon impact while
balancing the consumption and production of both renewable and conventional
energy. This framework also takes into account fluctuations in energy demand
and supply.
( 2 min )
In the present paper we introduce new optimization algorithms for the task of
density ratio estimation. More precisely, we extend the well-known KMM method
by constructing a suitable loss function, in order to encompass more general
situations in which the density ratio is estimated with respect to subsets of
the training data and the test data, respectively. The
associated codes can be found at https://github.com/CDAlecsa/Generalized-KMM.
( 2 min )
In machine learning models, the estimation of errors is often complex due to
distribution bias, particularly in spatial data such as those found in
environmental studies. We introduce an approach based on the ideas of
importance sampling to obtain an unbiased estimate of the target error. By
taking into account the difference between the target error distribution and
the available data, our method reweights the error at each sample point and
neutralizes the shift. Importance sampling and kernel density estimation are
used for the reweighting. We validate the effectiveness of our approach using
artificial
data that resemble real-world spatial datasets. Our findings demonstrate
advantages of the proposed approach for the estimation of the target error,
offering a solution to the distribution shift problem. The overall prediction
error dropped from 7% to just 2%, and it decreases further for larger samples.
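A minimal sketch of the reweighting idea, under the simplifying assumption that the source and target densities are known Gaussians (in practice the paper estimates densities, e.g. with kernel density estimation):

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Errors are observed at locations drawn from the source density p = N(0, 1),
# but we want the mean error under a shifted target density q = N(1, 1).
n = 200_000
x = rng.normal(0.0, 1.0, n)
err = x ** 2                      # per-sample error surrogate (illustrative)

# Importance weights w = q(x) / p(x) neutralize the distribution shift.
w = normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 1.0)
target_err = np.average(err, weights=w)

# Ground truth: E_q[x^2] = 1 + 1^2 = 2, while the naive average estimates 1.
assert abs(err.mean() - 1.0) < 0.05
assert abs(target_err - 2.0) < 0.15
```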
( 2 min )
Hurricanes present major challenges in the U.S. due to their devastating
impacts. Mitigating these risks is important, and the insurance industry is
central in this effort, using intricate statistical models for risk assessment.
However, these models often neglect key temporal and spatial hurricane patterns
and are limited by data scarcity. This study introduces a refined approach
combining the ARIMA model and K-means clustering to better capture hurricane
trends, and an autoencoder for enhanced hurricane simulations. Our experiments
show that this hybrid methodology effectively simulates historical hurricane
behaviors
while providing detailed projections of potential future trajectories and
intensities. Moreover, by leveraging a comprehensive yet selective dataset, our
simulations enrich the current understanding of hurricane patterns and offer
actionable insights for risk management strategies.
( 2 min )
Knowledge Graphs (KGs) often have two characteristics: heterogeneous graph
structure and text-rich entity/relation information. Text-based KG embeddings
can represent entities by encoding descriptions with pre-trained language
models, but no open-sourced library is specifically designed for KGs with PLMs
at present. In this paper, we present LambdaKG, a KGE library equipped with
many pre-trained language models (e.g., BERT, BART, T5, GPT-3), and
supports various tasks (e.g., knowledge graph completion, question answering,
recommendation, and knowledge probing). LambdaKG is publicly open-sourced at
https://github.com/zjunlp/PromptKG/tree/main/lambdaKG, with a demo video at
this http URL and long-term maintenance.
( 2 min )
Open-ended learning benefits immensely from the use of symbolic methods for
goal representation as they offer ways to structure knowledge for efficient and
transferable learning. However, the existing Hierarchical Reinforcement
Learning (HRL) approaches relying on symbolic reasoning are often limited as
they require a manual goal representation. The challenge in autonomously
discovering a symbolic goal representation is that it must preserve critical
information, such as the environment dynamics. In this work, we propose a
developmental mechanism for subgoal discovery via an emergent representation
that abstracts (i.e., groups together) sets of environment states that have
similar roles in the task. We create an HRL algorithm that gradually learns this
representation along with the policies and evaluate it on navigation tasks to
show the learned representation is interpretable and results in data
efficiency.
( 2 min )
In the presence of heterogeneous data, where randomly rotated objects fall
into multiple underlying categories, it is challenging to simultaneously
classify them into clusters and synchronize them based on pairwise relations.
This gives rise to the joint problem of community detection and
synchronization. We propose a series of semidefinite relaxations, and prove
their exact recovery when extending the celebrated stochastic block model to
this new setting where both rotations and cluster identities are to be
determined. Numerical experiments demonstrate the efficacy of our proposed
algorithms and confirm our theoretical result which indicates a sharp phase
transition for exact recovery.
( 2 min )
We aim to provide a general framework for computational photography that
recovers the real scene from imperfect images, via Deep Nonparametric
Convexified Filtering (DNCF). It consists of a nonparametric deep network to
resemble the physical equations behind the image formation, such as denoising,
super-resolution, inpainting, and flash. DNCF has no parameterization dependent
on training data, and therefore has strong generalization and robustness to
adversarial image manipulation. During inference, we also encourage the network
parameters to be nonnegative and create a bi-convex function of the input and
parameters, which is amenable to second-order optimization algorithms even with
limited running time, yielding a 10X acceleration over Deep Image Prior. With
these tools, we empirically verify its capability to defend image
classification deep networks against adversary attack algorithms in real-time.
( 2 min )
We consider the problem of approximating the regression function from noisy
vector-valued data by an online learning algorithm using an appropriate
reproducing kernel Hilbert space (RKHS) as prior. In an online algorithm,
i.i.d. samples become available one by one by a random process and are
successively processed to build approximations to the regression function. We
are interested in the asymptotic performance of such online approximation
algorithms and show that the expected squared error in the RKHS norm can be
bounded by $C^2 (m+1)^{-s/(2+s)}$, where $m$ is the current number of processed
data, the parameter $0<s\leq 1$ expresses an additional smoothness assumption
on the regression function and the constant $C$ depends on the variance of the
input noise, the smoothness of the regression function and further parameters
of the algorithm.
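The online setting can be illustrated with a kernel stochastic-gradient sketch (functional gradient descent in an RKHS, processing one i.i.d. sample at a time); the Gaussian kernel, step-size schedule, and target function below are illustrative assumptions, not the paper's algorithm or its rate.

```python
import numpy as np

rng = np.random.default_rng(3)

def k(a, b, gamma=2.0):
    """Gaussian RKHS kernel on the real line."""
    return np.exp(-gamma * (a - b) ** 2)

f = np.sin                        # unknown regression function (illustrative)
centers = np.empty(0)
coefs = np.empty(0)

def predict(x):
    """Current estimate: a kernel expansion over the samples seen so far."""
    return float(np.sum(coefs * k(centers, x))) if len(centers) else 0.0

# Samples become available one by one; each triggers a functional gradient
# step on the squared loss, appending one kernel centre to the expansion.
for m in range(1, 2001):
    x = rng.uniform(-np.pi, np.pi)
    y = f(x) + 0.1 * rng.standard_normal()     # noisy observation
    step = 1.0 / np.sqrt(m)                    # decaying step size
    residual = predict(x) - y
    centers = np.append(centers, x)
    coefs = np.append(coefs, -step * residual)

grid = np.linspace(-3.0, 3.0, 61)
mse = np.mean([(predict(x) - f(x)) ** 2 for x in grid])
assert mse < 0.3   # well below the ~0.5 error of always predicting zero
```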
( 2 min )
Benign overfitting, the phenomenon where interpolating models generalize well
in the presence of noisy data, was first observed in neural network models
trained with gradient descent. To better understand this empirical observation,
we consider the generalization error of two-layer neural networks trained to
interpolation by gradient descent on the logistic loss following random
initialization. We assume the data comes from well-separated class-conditional
log-concave distributions and allow for a constant fraction of the training
labels to be corrupted by an adversary. We show that in this setting, neural
networks exhibit benign overfitting: they can be driven to zero training error,
perfectly fitting any noisy training labels, and simultaneously achieve minimax
optimal test error. In contrast to previous work on benign overfitting that
require linear or kernel-based predictors, our analysis holds in a setting
where both the model and learning dynamics are fundamentally nonlinear.
( 2 min )
Large language model (LLM) agents are programs that extend the capabilities of standalone LLMs with 1) access to external tools (APIs, functions, webhooks, plugins, and so on), and 2) the ability to plan and execute tasks in a self-directed fashion. Often, LLMs need to interact with other software, databases, or APIs to accomplish complex tasks. […]
( 13 min )
With Style2Fab, makers can rapidly customize models of 3D-printable objects, such as assistive devices, without hampering their functionality.
( 10 min )
In the first part of this multi-part blog series, you will learn how to create a scalable training pipeline and prepare training data for Comprehend Custom Classification models. We will introduce a custom classifier training pipeline that can be deployed in your AWS account with a few clicks.
( 10 min )
Today, generative AI models cover a variety of tasks, from text summarization and Q&A to image and video generation. To improve the quality of output, approaches like n-shot learning, prompt engineering, Retrieval Augmented Generation (RAG), and fine-tuning are used. Fine-tuning allows you to adjust these generative AI models to achieve improved performance on your domain-specific […]
( 8 min )
This post takes you through the most common challenges that customers face when searching internal documents, and gives you concrete guidance on how AWS services can be used to create a generative AI conversational bot that makes internal information more useful. Unstructured data accounts for 80% of all the data found within organizations, consisting of […]
( 14 min )
Modern applications heavily rely on robust network infrastructure, requiring continuous innovation. In this evolving landscape, Microsoft is at the forefront, spearheading innovation efforts in networking and strengthening the foundational network infrastructure that underpins the cloud ecosystem. By investing in and enhancing this critical infrastructure, Microsoft not only ensures the resilience and scalability of cloud services […]
The post Microsoft at ACM SIGCOMM 2023: Innovating the future of networking appeared first on Microsoft Research.
( 10 min )
What’s the driving force behind AI’s recent, rapid progress? Research manager Ahmed Awadallah shares his insights on this, the two-stage approach to training large-scale models, and the need for better model evaluation in this episode of the #MSRPodcast.
The post AI Frontiers: The future of scale with Ahmed Awadallah and Ashley Llorens appeared first on Microsoft Research.
( 31 min )
Working as a data scientist is the dream of many IT professionals these days. It is no secret that data science is a skyrocketing field attracting young professionals and inspiring many to switch careers to data science. On one front are young professionals who study their courses in colleges to pursue their dream of becoming…
The post Are data science certifications the gateway to competitive pay? appeared first on Data Science Central.
( 19 min )
CUPED: Improve Your A/B Testing - Detect Smaller Gains, Utilise Smaller Samples and Make Smarter Decisions!
The post CUPED for starters: Enhancing controlled experiments with pre-experiment data appeared first on Data Science Central.
( 26 min )
The best way to model business and consumer dynamics is collaboratively, with stakeholders all in the same virtual room contributing. Of course, this has been happening asynchronously for some time now, but the potential exists for more real-time interaction. Modelers don’t work in a vacuum, of course. The iterations between a modeler who develops a…
The post Collaborative visual knowledge graph modeling at the system level appeared first on Data Science Central.
( 20 min )
GFN Thursday is downright demonic, as Devil May Cry 5 comes to GeForce NOW. Capcom’s action-packed third-person brawler leads 15 titles joining the GeForce NOW library this week, including Gears Tactics and The Crew Motorfest. It’s also the last week to take on the Ultimate KovaaK’s Challenge. Get on the leaderboard today for a chance […]
( 6 min )
The machine-learning method works on most mobile devices and could be expanded to assess other motor disorders outside of the doctor’s office.
( 10 min )
Although computer scientists may initially treat data bias and error as a nuisance, researchers argue it’s a hidden treasure trove for reflecting societal values.
( 10 min )
Researchers use synthetic data to improve a model’s ability to grasp conceptual information, which could enhance automatic captioning and question-answering systems.
( 10 min )
Searching for insights in a repository of free-form text documents can be like finding a needle in a haystack. A traditional approach might be to use word counting or other basic analysis to parse documents, but with the power of Amazon AI and machine learning (ML) tools, we can gather deeper understanding of the content. […]
( 8 min )
In this issue: Efficient polyglot analytics on semantic data aids query performance; generative retrieval for conversational question answering improves dialogue-based interfaces; a new tool uses ML to address capacity degradation in lithium-ion batteries.
The post Research Focus: Week of September 11, 2023 appeared first on Microsoft Research.
( 9 min )
Generative AI-based models can not only learn and understand natural languages — they can learn the very language of nature itself, presenting new possibilities for scientific research. Anima Anandkumar, Bren Professor at Caltech and senior director of AI research at NVIDIA, was recently invited to speak at the President’s Council of Advisors on Science and […]
( 5 min )
In an event at the White House today, NVIDIA announced support for voluntary commitments that the Biden Administration developed to ensure advanced AI systems are safe, secure and trustworthy. The news came the same day NVIDIA’s chief scientist, Bill Dally, testified before a U.S. Senate subcommittee seeking input on potential legislation covering generative AI. Separately, […]
( 6 min )
Generative AI’s transformative effect on the auto industry took center stage last week at the International Motor Show Germany, known as IAA, in Munich. NVIDIA’s Danny Shapiro, VP of automotive marketing, explained in his IAA keynote how this driving force is accelerating innovation and streamlining processes — from advancing design, engineering and digital-twin deployment for […]
( 7 min )
Ten miles in from Long Island’s Atlantic coast, Shinjae Yoo is revving his engine. The computational scientist and machine learning group lead at the U.S. Department of Energy’s Brookhaven National Laboratory is one of many researchers gearing up to run quantum computing simulations on a supercomputer for the first time, thanks to new software. Yoo’s […]
( 6 min )
Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks and demonstrates how NVIDIA Studio technology improves creative workflows. When it comes to converting 2D concepts into 3D masterpieces, self-taught visual development artist Alex Treviño has confidence in the potential of all […]
( 7 min )
Businesses today constantly strive to gain a competitive edge in their marketing efforts. Leveraging their data effectively to create data-driven campaigns is the best way to trump the competition. One of the best tools at their disposal to utilize their data is a data warehouse. Data warehousing is crucial in enhancing marketing and campaign management…
The post Data Warehousing: The key to effective marketing campaign management appeared first on Data Science Central.
( 21 min )
The way we work has changed, with remote teams now a common part of the landscape. While remote work offers flexibility, it also brings challenges. Managing remote teams effectively is crucial to ensure productivity and collaboration. In this article, we’ll explore how using time tracking for remote teams can help manage employees’ performance better. Time-tracking…
The post Data-driven insights: Improving remote team performance with time-tracking analytics appeared first on Data Science Central.
( 21 min )
In our increasingly interconnected world, the digital realm has become both a frontier of innovation and a battleground of threats. As technology advances, so do the tactics of malicious actors who seek to exploit vulnerabilities in our digital infrastructure. The rapid evolution of cyber threats calls for a paradigm shift in defense strategies, and that’s…
The post AI and the cyber challenge: Bridging vulnerabilities in modern defense strategies appeared first on Data Science Central.
( 22 min )
This research paper was presented at the 28th ACM SIGPLAN International Conference on Functional Programming (ICFP), a premier forum for discussing design, implementations, principles, and uses of functional programming. Functional programming languages offer a host of advantages, such as ensuring memory safety and eliminating arbitrary side effects. […]
The post FP2: Fully In-Place Functional Programming provides memory reuse for pure functional programs appeared first on Microsoft Research.
( 10 min )
Today, we are excited to announce the simplified Quick setup experience in Amazon SageMaker. With this new capability, individual users can launch Amazon SageMaker Studio with default presets in minutes. SageMaker Studio is an integrated development environment (IDE) for machine learning (ML). ML practitioners can perform all ML development steps—from preparing their data to building, […]
( 6
min )
This post addresses the challenge faced by developers and support teams when application logs are presented in languages other than English, making it difficult for them to debug and provide support. The proposed solution uses Amazon Translate to automatically translate non-English logs in CloudWatch, and provides step-by-step guidance on deploying the solution in your environment.
( 6
min )
In this post, we share how SageMaker helps the data science team at Scalable manage the lifecycle of a data science project efficiently, namely the email classifier project. The lifecycle starts with the initial phase of data analysis and exploration with SageMaker Studio; moves on to model experimentation and deployment with SageMaker training, inference, and Hugging Face DLCs; and completes with a training pipeline with SageMaker Pipelines integrated with other AWS services.
( 10
min )
The system could improve image quality in video streaming or help autonomous vehicles identify road hazards in real time.
( 10
min )
Today, we are excited to announce that the Falcon 180B foundation model developed by Technology Innovation Institute (TII) is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. With a 180-billion-parameter size and trained on a massive 3.5-trillion-token dataset, Falcon 180B is the largest and one of the most performant models with openly accessible weights. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Falcon 180B model via SageMaker JumpStart.
( 14
min )
Amazon SageMaker Domain supports SageMaker machine learning (ML) environments, including SageMaker Studio and SageMaker Canvas. SageMaker Studio is a fully integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models, improving […]
( 10
min )
In its debut on the MLPerf industry benchmarks, the NVIDIA GH200 Grace Hopper Superchip ran all data center inference tests, extending the leading performance of NVIDIA H100 Tensor Core GPUs. The overall results showed the exceptional performance and versatility of the NVIDIA AI platform from the cloud to the network’s edge. Separately, NVIDIA announced inference Read article >
( 7
min )
“Lightning” system connects photons to the electronic components of computers using a novel abstraction, creating the first photonic computing prototype to serve real-time machine-learning inference requests.
( 9
min )
There has been much recent progress in forecasting the next observation of a
linear dynamical system (LDS), which is known as improper learning, as well
as in the estimation of its system matrices, which is known as proper
learning of an LDS. We present an approach to proper learning of an LDS which,
in spite of the non-convexity of the problem, guarantees global convergence of
numerical solutions to a least-squares estimator. We present promising
computational results.
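The least-squares estimator at the core of proper learning can be illustrated on a scalar LDS (a minimal hypothetical sketch, not the authors' algorithm):

```python
# Proper learning of a scalar LDS x_{t+1} = a * x_t + noise:
# the least-squares estimate of the system matrix reduces to
# a_hat = sum(x_t * x_{t+1}) / sum(x_t ** 2).
def estimate_transition(xs):
    num = sum(x * y for x, y in zip(xs, xs[1:]))
    den = sum(x * x for x in xs[:-1])
    return num / den

# A noise-free trajectory with a = 0.5 recovers the parameter exactly.
traj = [1.0, 0.5, 0.25, 0.125]
print(estimate_transition(traj))  # -> 0.5
```

In the matrix-valued case the same formula becomes a multivariate least-squares problem over the system matrices.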
( 2
min )
Motivation: We explored how explainable AI (XAI) can help shed light on the
inner workings of neural networks for protein function prediction, by
extending the widely used XAI method of integrated gradients such that latent
representations inside of transformer models, which were finetuned to Gene
Ontology term and Enzyme Commission number prediction, can be inspected too.
Results: The approach enabled us to identify amino acids in the sequences that
the transformers pay particular attention to, and to show that these relevant
sequence parts reflect expectations from biology and chemistry, both in the
embedding layer and inside of the model, where we identified transformer heads
with a statistically significant correspondence of attribution maps with ground
truth sequence annotations (e.g., transmembrane regions, active sites) across
many proteins. Availability and Implementation: Source code can be accessed at
https://github.com/markuswenzel/xai-proteins .
( 2
min )
We study the problem of estimating mixtures of Gaussians under the constraint
of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4
\log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a
mixture of $k$ Gaussians up to total variation distance $\alpha$ while
satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample
complexity upper bound for the problem that does not make any structural
assumptions on the GMMs.
To solve the problem, we devise a new framework which may be useful for other
tasks. On a high level, we show that if a class of distributions (such as
Gaussians) is (1) list decodable and (2) admits a "locally small" cover
[BKSW19] with respect to total variation distance, then the class of its
mixtures is privately learnable. The proof circumvents a known barrier
indicating that, unlike Gaussians, GMMs do not admit a locally small cover
[AAL21].
( 2
min )
This paper presents a novel reconstruction method that leverages Diffusion
Models to protect machine learning classifiers against adversarial attacks, all
without requiring any modifications to the classifiers themselves. The
susceptibility of machine learning models to minor input perturbations renders
them vulnerable to adversarial attacks. While diffusion-based methods are
typically disregarded for adversarial defense due to their slow reverse
process, this paper demonstrates that our proposed method offers robustness
against adversarial threats while preserving clean accuracy, speed, and
plug-and-play compatibility. Code at:
https://github.com/HondamunigePrasannaSilva/DiffDefence.
( 2
min )
Multiscale stochastic dynamical systems have been widely adopted in
scientific and engineering problems due to their capability of depicting
complex phenomena in many real-world applications. This work is devoted to
investigating the effective reduced dynamics of a slow-fast stochastic
dynamical system. Given short-term observation data satisfying some
unknown slow-fast stochastic system, we propose a novel algorithm, including a
neural network called Auto-SDE, to learn the invariant slow manifold. Our approach
captures the evolutionary nature of a series of time-dependent autoencoder
neural networks, with the loss constructed from a discretized stochastic
differential equation. Our algorithm is also shown to be accurate, stable, and
effective through numerical experiments under various evaluation metrics.
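A loss constructed from a discretized stochastic differential equation can be sketched, under strong simplifying assumptions (scalar state, known drift form, no autoencoder), as an Euler–Maruyama residual:

```python
# Residual loss from an Euler-Maruyama discretization of
# dX = f(X) dt + sigma dW: penalize trajectories whose steps deviate
# from the drift term (a hypothetical simplification of such losses).
def sde_residual_loss(xs, drift, dt):
    return sum(
        (x1 - x0 - drift(x0) * dt) ** 2
        for x0, x1 in zip(xs, xs[1:])
    ) / (len(xs) - 1)

# A trajectory that follows dX = -X dt exactly has (near-)zero loss.
xs = [1.0]
for _ in range(5):
    xs.append(xs[-1] + (-xs[-1]) * 0.01)
print(sde_residual_loss(xs, lambda x: -x, 0.01))  # ~0 (machine precision)
```

In the paper's setting the drift itself is parameterized by neural networks and learned by minimizing such a residual.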
( 2
min )
In this work, we propose a novel and general method to construct tight
frames on graphs with compact supports based on a series of hierarchical
partitions. Starting from our abstract construction that generalizes previous
methods based on partition trees, we are able to flexibly incorporate subgraph
Laplacians into our design of graph frames. Consequently, our general methods
permit adjusting the (subgraph) vanishing moments of the framelets and extra
properties, such as directionality, for efficiently representing graph signals
with path-like supports. Several variants are explicitly defined and tested.
Experimental results show our proposed graph frames perform superiorly in
non-linear approximation tasks.
( 2
min )
Multiagent systems aim to accomplish highly complex learning tasks through
decentralised consensus seeking dynamics and their use has garnered a great
deal of attention in the signal processing and computational intelligence
societies. This article examines the behaviour of multiagent networked systems
with nonlinear filtering/learning dynamics. To this end, a general formulation
for the actions of an agent in multiagent networked systems is presented and
conditions for achieving a cohesive learning behaviour are given. Importantly,
applications of the derived framework in distributed and federated learning
scenarios are presented.
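As a point of reference for the consensus-seeking dynamics mentioned above, a single linear averaging step looks like this (a generic textbook sketch with hypothetical weights, not the article's nonlinear formulation):

```python
# One round of decentralized consensus: each agent replaces its state
# with a weighted average of its neighbours' states. With a doubly
# stochastic weight matrix, states contract toward the network average.
def consensus_step(states, weights):
    return [
        sum(w * s for w, s in zip(row, states))
        for row in weights
    ]

# Three agents with hypothetical symmetric weights; the states move
# toward the average (4.0) while preserving it across the network.
W = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
states = [0.0, 4.0, 8.0]
print(consensus_step(states, W))  # -> [3.0, 4.0, 5.0]
```

Nonlinear filtering/learning dynamics replace the weighted sum with a nonlinear update while keeping the same networked structure.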
( 2
min )
A cross-departmental team is leading efforts to utilize machine learning for increased efficiency in heating and cooling MIT’s buildings.
( 10
min )
The PhD student is honing algorithms for designing large structures with less material — helping to shrink the construction industry’s huge carbon footprint.
( 10
min )
The world’s largest democracy is poised to transform itself and the world, embracing AI on an enormous scale. Speaking with the press Friday in Bengaluru, in the context of announcements from two of India’s largest conglomerates, Reliance Industries Limited and Tata Group, NVIDIA founder and CEO Jensen Huang detailed plans to bring AI technology and Read article >
( 6
min )
In this post, we’ll take you on a journey to rapidly build and deploy a document search indexing solution that helps your organization to better harness and extract insights from documents. Whether you're in Human Resources looking for specific clauses in employee contracts, or a financial analyst sifting through a mountain of invoices to extract payment data, this solution is tailored to empower you to access the information you need with unprecedented speed and accuracy.
( 11
min )
Digital publishers are continuously looking for ways to streamline and automate their media workflows in order to generate and publish new content as rapidly as they can. Publishers can have repositories containing millions of images and in order to save money, they need to be able to reuse these images across articles. Finding the image that best matches an article in repositories of this scale can be a time-consuming, repetitive, manual task that can be automated. It also relies on the images in the repository being tagged correctly, which can also be automated (for a customer success story, refer to Aller Media Finds Success with KeyCore and AWS). In this post, we demonstrate how to use Amazon Rekognition, Amazon SageMaker JumpStart, and Amazon OpenSearch Service to solve this business problem.
( 10
min )
Machine learning (ML) is transforming every industry, process, and business, but the path to success is not always straightforward. In this blog post, we demonstrate how Duke Energy, a Fortune 150 company headquartered in Charlotte, NC., collaborated with the AWS Machine Learning Solutions Lab (MLSL) to use computer vision to automate the inspection of wooden utility poles and help prevent power outages, property damage and even injuries.
( 13
min )
Gender, race, and age disparities in AI-generated images persist. This AIES 2023 study on text-to-image models shows that even basic prompts can lead to underrepresentation, calling for responsible bias mitigation strategies.
The post Understanding social biases through the text-to-image generation lens appeared first on Microsoft Research.
( 10
min )
Every year, interns help advance research at Microsoft. In “Intern Insights,” PhD students Anunay Kulshrestha and Karan Newatia talk with cryptographer Josh Benaloh about working on the verifiable election technology ElectionGuard.
The post Intern Insights: Dr. Josh Benaloh with Anunay Kulshrestha and Karan Newatia appeared first on Microsoft Research.
( 30
min )
Thanks to rapid technological advances, consumers have become accustomed to an unprecedented level of convenience and efficiency. Smartphones make it easier than ever to search for a product and have it delivered right to the front door. Video chat technology lets friends and family on different continents connect with ease. With voice command tools, AI Read article >
( 12
min )
GeForce NOW brings expanded support for PC Game Pass to members this week. Members can stream eight more games from Microsoft’s subscription service, including four titles from hit publisher Focus Entertainment. Play A Plague Tale: Requiem, Atomic Heart and more from the GeForce NOW library at up to 4K resolution and 120 frames per second Read article >
( 5
min )
In this post, we will build an end-to-end solution to find optimal control policies using only historical data on Amazon SageMaker using Ray’s RLlib library. To learn more about reinforcement learning, see Use Reinforcement Learning with Amazon SageMaker.
( 10
min )
This post details how to set up container-based GPU metrics and provides an example of collecting these metrics from EKS pods.
( 15
min )
In this post, we provide some best practices to maximize the value of SageMaker Pipelines and make the development experience seamless. We also discuss some common design scenarios and patterns when building SageMaker Pipelines and provide examples for addressing them.
( 11
min )
Retrosynthesis analysis is a critical task in organic chemistry and central to many important industries. It primarily involves decomposing a target molecule into commercially available molecules step by step. Since synthesis strategies can be quite diverse and strategic, retrosynthesis planning with expert knowledge has long been considered an “art.” Recently, machine learning-based approaches have achieved […]
The post Incorporating chemists’ insight with AI models for single-step retrosynthesis prediction appeared first on Microsoft Research.
( 11
min )
In an increasingly interconnected world where digital transactions have become the norm, the battle against fraud has taken on new dimensions. The challenge lies not only in identifying familiar fraud patterns but also in unearthing the intricate web of evolving deceptions that threaten industries such as finance, e-commerce, and insurance. As fraudsters continually adapt their… Read More »Fraud detection using Machine Learning: Unmasking deceptive patterns
The post Fraud detection using Machine Learning: Unmasking deceptive patterns appeared first on Data Science Central.
( 30
min )
New evaluation methods and a commitment to continual improvement are musts if we’re to build multimodal AI systems that advance human goals. Learn about cutting-edge research into the responsible development and use of multimodal AI at Microsoft.
The post Frontiers of multimodal learning: A responsible AI approach appeared first on Microsoft Research.
( 25
min )
In this post, we build a secure enterprise application using AWS Amplify that invokes an Amazon SageMaker JumpStart foundation model, Amazon SageMaker endpoints, and Amazon OpenSearch Service to show how to create text-to-text and text-to-image applications with Retrieval Augmented Generation (RAG). You can use this post as a reference for building secure enterprise applications in the generative AI domain using AWS services.
( 7
min )
This post shows you how to configure the Amazon Kendra AEM connector to index your content and search your AEM assets and pages. The connector also ingests the access control list (ACL) information for each document. The ACL information is used to show search results filtered by what a user has access to.
( 11
min )
Today, we are excited to announce the capability to fine-tune Llama 2 models by Meta using Amazon SageMaker JumpStart. The Llama 2 family of large language models (LLMs) is a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned LLMs, called Llama-2-chat, are optimized for dialogue use cases.
( 46
min )
Recently, generative AI applications have captured widespread attention and imagination. Customers want to deploy generative AI models on GPUs but at the same time are conscious of costs. SageMaker MMEs support GPU instances and are a great option for these types of applications. Today, we are excited to announce TorchServe support for SageMaker MMEs. This new model server support gives you all the benefits of MMEs while still using the serving stack that TorchServe customers are most familiar with. In this post, we demonstrate how to host generative AI models, such as Stable Diffusion and Segment Anything Model, on SageMaker MMEs using TorchServe, and build a language-guided editing solution that can help artists and content creators develop and iterate their artwork faster.
( 12
min )
Before she entered high school, Ge Dong wanted to be a physicist like her mom, a professor at Shanghai Jiao Tong University.
( 6
min )
Rafi Nizam is an award-winning independent animator, director, character designer and more. He’s developed feature films at Sony Pictures, children’s series and comedies at BBC and global transmedia content at NBCUniversal.
( 8
min )
As generative AI evolves, certain trends are becoming clearer. In yet another milestone, AI consulting giant McKinsey unveiled its own generative AI tool for employees, called Lilli. My comments: a) McKinsey launching this agent gives credibility to the domain of enterprise AI assistants; b) on one hand, it’s a familiar copilot strategy – but… Read More »Generative AI megatrends: Generative AI for enterprise is proven vs generative AI for consumer is not – Part One
The post Generative AI megatrends: Generative AI for enterprise is proven vs generative AI for consumer is not – Part One appeared first on Data Science Central.
( 19
min )
Programmers can no longer rely on the traditional method of targeting specific hardware accelerators with conditional pragmas (e.g., #ifdef) to match the software to the hardware at a particular datacenter or customer site. Humans writing machine-specific code cannot address the exponential increase in possible hardware combinations in the modern multivendor, multiarchitecture computing environment. Open software provides a multiarchitecture, multivendor solution that addresses the complexities of accelerated HPC and AI computing.
The post Addressing the challenge of software support for multiarchitecture AI accelerated HPC appeared first on Data Science Central.
( 25
min )
In part one of this blog, we saw how there is an increasing case for an enterprise chatbot use case. In part two, we ask the question: could a consumer chatbot, i.e. a directly customer-facing chatbot, be a flawed use case for an LLM? The consumer (customer-facing) chatbot case is a familiar use case… Read More »Generative AI megatrends: Generative AI for enterprise is proven vs generative AI for consumer is not – Part two
The post Generative AI megatrends: Generative AI for enterprise is proven vs generative AI for consumer is not – Part two appeared first on Data Science Central.
( 19
min )
In this post, we show how the Carrier and AWS teams applied ML to predict faults across large fleets of equipment using a single model. We first highlight how we use AWS Glue for highly parallel data processing. We then discuss how Amazon SageMaker helps us with feature engineering and building a scalable supervised deep learning model.
( 10
min )
In this post, we target these situations and solve the problem of risking high costs by deploying large foundation models to Amazon SageMaker asynchronous endpoints from Amazon SageMaker JumpStart. This can help cut costs of the architecture, allowing the endpoint to run only when requests are in the queue and for a short time-to-live, while scaling down to zero when no requests are waiting to be serviced. This sounds great for a lot of use cases; however, an endpoint that has scaled down to zero will introduce a cold start time before being able to serve inferences.
( 10
min )
Microsoft researchers are proposing a new way to ensure greater trust and accountability in email, texts, direct messages on social platforms, even phone calls, to help mitigate sophisticated threats from AI-related scams and fraud.
The post Rethinking trust in direct messages in the AI era appeared first on Microsoft Research.
( 14
min )
With coral reefs in rapid decline across the globe, researchers from the University of Hawaii at Mānoa have pioneered an AI-based surveying tool that monitors reef health from the sky. Using deep learning models and high-resolution satellite imagery powered by NVIDIA GPUs, the researchers have developed a new method for spotting and tracking coral reef Read article >
( 6
min )
Creating 3D scans of physical products can be time consuming. Businesses often use traditional methods, like photogrammetry-based apps and scanners, but these can take hours or even days. They also don’t always provide the 3D quality and level of detail needed to make models look realistic in all its applications. Italy-based startup Covision Media is Read article >
( 7
min )
Underscoring NVIDIA’s growing relationship with the global technology superpower, Indian Prime Minister Narendra Modi met with NVIDIA founder and CEO Jensen Huang Monday evening. The meeting at 7 Lok Kalyan Marg — as the Prime Minister’s official residence in New Delhi is known — comes as Modi prepares to host a gathering of leaders from Read article >
( 5
min )
This blog post is not the end of my journey to integrate GenAI with my “Thinking Like a Data Scientist” (TLADS) methodology, but it is the last post on this leg of the journey. And the journey has been fascinating. I can’t wait to get this modified material in front of my students. In part… Read More »Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part III
The post Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part III appeared first on Data Science Central.
( 24
min )
Knowledge graphs are powerful tools for representing and organising complex
biomedical data. Several knowledge graph embedding algorithms have been
proposed to learn from and complete knowledge graphs. However, a recent study
demonstrates the limited efficacy of these embedding algorithms when applied to
biomedical knowledge graphs, raising the question of whether knowledge graph
embeddings have limitations in biomedical settings. This study aims to apply
state-of-the-art knowledge graph embedding models in the context of a recent
biomedical knowledge graph, BioKG, and evaluate their performance and potential
downstream uses. We achieve a three-fold improvement in performance, as
measured by the HITS@10 score, over previous work on the same biomedical knowledge
graph. Additionally, we provide interpretable predictions through a rule-based
method. We demonstrate that knowledge graph embedding models are applicable in
practice by evaluating the best-performing model on four tasks that represent
real-life polypharmacy situations. Results suggest that knowledge learnt from
large biomedical knowledge graphs can be transferred to such downstream use
cases. Our code is available at https://github.com/aryopg/biokge.
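The HITS@10 score reported above has a simple standard definition, sketched here (function and data names are illustrative):

```python
# HITS@10: the fraction of test queries whose true entity appears
# among the top 10 candidates ranked by the model's scores.
def hits_at_10(ranked_lists, true_entities):
    hits = sum(
        1 for ranking, true in zip(ranked_lists, true_entities)
        if true in ranking[:10]
    )
    return hits / len(true_entities)

# Two hypothetical queries: the true entity is in the top 10 for the
# first ranking only, giving a score of 0.5.
rankings = [list(range(20)), list(range(100, 120))]
print(hits_at_10(rankings, [3, 115]))  # -> 0.5
```

A three-fold improvement means this fraction is three times higher than the previously reported value on the same benchmark.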
( 3
min )
In reinforcement learning (RL), key components of many algorithms are the
exploration strategy and replay buffer. These strategies regulate what
environment data is collected and trained on and have been extensively studied
in the RL literature. In this paper, we investigate the impact of these
components in the context of generalisation in multi-task RL. We investigate
the hypothesis that collecting and training on more diverse data from the
training environments will improve zero-shot generalisation to new tasks. We
motivate mathematically and show empirically that generalisation to tasks that
are "reachable" during training is improved by increasing the diversity of
transitions in the replay buffer. Furthermore, we show empirically that this
same strategy also shows improvement for generalisation to similar but
"unreachable" tasks, which could be due to improved generalisation of the
learned latent representations.
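One way to picture the diversity hypothesis is a buffer that collapses duplicate transitions so uniform sampling covers more distinct experiences; this is a hypothetical illustration, not the paper's strategy:

```python
import random

# A replay buffer that keeps at most one copy of each (state, action,
# next_state) transition, so uniform sampling covers a more diverse
# set of experiences (illustrative only).
class DiverseReplayBuffer:
    def __init__(self):
        self._transitions = {}

    def add(self, state, action, reward, next_state):
        # Duplicate transitions overwrite their earlier copy.
        self._transitions[(state, action, next_state)] = (
            state, action, reward, next_state
        )

    def __len__(self):
        return len(self._transitions)

    def sample(self, k, rng=random):
        return rng.sample(list(self._transitions.values()), k)

buf = DiverseReplayBuffer()
for _ in range(100):           # the same common experience, repeatedly
    buf.add("s0", "a", 1.0, "s1")
buf.add("s1", "b", 0.0, "s2")  # one rare transition
print(len(buf))  # -> 2: duplicates collapse, the rare data survives
```

Under uniform sampling from this buffer, the rare transition is drawn far more often than it would be from a buffer holding all 101 raw copies.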
( 2
min )
We present the Multi-Modal Discussion Transformer (mDT), a novel multi-modal
graph-based transformer model for detecting hate speech in online social
networks, such as Reddit discussions. In contrast to traditional comment-only
methods, our approach to labelling a comment as hate speech involves a holistic
analysis of text and images grounded in the discussion context. This is done by
leveraging graph transformers to capture the contextual relationships in the
entire discussion surrounding a comment and grounding the interwoven fusion
layers that combine individual comments' text and image embeddings instead of
processing modalities separately. We compare the performance of our model to
baselines that only process individual comments and conduct extensive ablation
studies. To evaluate our work, we present a new dataset, HatefulDiscussions,
comprising complete multi-modal discussions from multiple online communities on
Reddit. We conclude with future work for multimodal solutions to deliver social
value in online contexts, arguing that capturing a holistic view of a
conversation significantly advances the effort to detect anti-social behaviour.
( 2
min )
The advent of novel 5G services and applications with binding latency
requirements and guaranteed Quality of Service (QoS) hastened the need to
incorporate autonomous and proactive decision-making in network management
procedures. The objective of our study is to provide a thorough analysis of
predictive latency within 5G networks by utilizing real-world network data that
is accessible to mobile network operators (MNOs). In particular, (i) we present
an analytical formulation of the user-plane latency as a Hypoexponential
distribution, which is validated by means of a comparative analysis with
empirical measurements, and (ii) we present experimental results on
probabilistic regression, anomaly detection, and predictive forecasting,
leveraging emerging domains in Machine Learning (ML) such as Bayesian
Learning (BL) and Machine Learning on Graphs (GML). We test our predictive
framework using data gathered from scenarios of vehicular mobility, dense-urban
traffic, and social gathering events. Our results provide valuable insights
into the efficacy of predictive algorithms in practical applications.
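A Hypoexponential latency variable is the sum of independent exponential stages with distinct rates; a minimal Monte Carlo sketch with hypothetical stage rates:

```python
import random

# User-plane latency modeled as a Hypoexponential variable: the sum
# of independent exponential processing stages with distinct rates.
def sample_latency(rates, rng):
    return sum(rng.expovariate(r) for r in rates)

# Two hypothetical stages (rates per ms); the theoretical mean
# latency is 1/2 + 1/5 = 0.7 ms.
rng = random.Random(0)
samples = [sample_latency([2.0, 5.0], rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # close to the theoretical mean of 0.7
```

Fitting the stage rates to empirical measurements is what validates the analytical formulation against real network data.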
( 2
min )
Pre-trained large language models demonstrate potential in extracting
information from DNA sequences, yet adapting to a variety of tasks and data
modalities remains a challenge. To address this, we propose DNAGPT, a
generalized DNA pre-training model trained on over 200 billion base pairs from
all mammals. By enhancing the classic GPT model with a binary classification
task (DNA sequence order), a numerical regression task (guanine-cytosine
content prediction), and a comprehensive token language, DNAGPT can handle
versatile DNA analysis tasks while processing both sequence and numerical data.
Our evaluation of genomic signal and region recognition, mRNA abundance
regression, and artificial genome generation tasks demonstrates DNAGPT's
superior performance compared to existing models designed for specific
downstream tasks, benefiting from pre-training using the newly designed model
structure.
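The guanine-cytosine content used as DNAGPT's regression target is simply the fraction of G/C bases in a sequence:

```python
# Guanine-cytosine (GC) content: the fraction of bases in a DNA
# sequence that are G or C -- the numerical regression target
# described above.
def gc_content(sequence):
    sequence = sequence.upper()
    return sum(base in "GC" for base in sequence) / len(sequence)

print(gc_content("ATGCGC"))  # -> 0.6666666666666666
```

The model learns to predict this quantity directly from the token representation of the sequence.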
( 2
min )
We study the use of binary activated neural networks as interpretable and
explainable predictors in the context of regression tasks on tabular data; more
specifically, we provide guarantees on their expressiveness, present an
approach based on the efficient computation of SHAP values for quantifying the
relative importance of the features, hidden neurons and even weights. As the
model's simplicity is instrumental in achieving interpretability, we propose a
greedy algorithm for building compact binary activated networks. This approach
does not require fixing the network's architecture in advance: the network is
built one layer at a time, one neuron at a time, leading to predictors that
are not needlessly complex for a given task.
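A binary activated layer, as studied above, thresholds each neuron's weighted input to 0 or 1; a minimal sketch with hypothetical parameters:

```python
# One binary activated layer: each hidden neuron outputs 1 if its
# weighted input plus bias is positive, else 0. Stacking such layers,
# built one neuron at a time, yields the compact predictors above.
def binary_layer(x, weights, biases):
    return [
        int(sum(w * xi for w, xi in zip(row, x)) + b > 0)
        for row, b in zip(weights, biases)
    ]

# Two neurons acting on a 2-feature input (hypothetical parameters).
x = [1.0, -2.0]
W = [[1.0, 1.0], [0.5, -0.5]]
b = [0.0, -1.0]
print(binary_layer(x, W, b))  # -> [0, 1]
```

Because each neuron's output is 0 or 1, the relative importance of features and neurons can be attributed exactly, which is what makes the SHAP computation efficient.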
( 2
min )
We propose to apply several gradient estimation techniques to enable the
differentiation of programs with discrete randomness in High Energy Physics.
Such programs are common in High Energy Physics due to the presence of
branching processes and clustering-based analysis. Thus, differentiating such
programs can open the way for gradient-based optimization in the context of
detector design optimization, simulator tuning, or data analysis and
reconstruction optimization. We discuss several possible gradient estimation
strategies, including the recent Stochastic AD method, and compare them in
simplified detector design experiments. In doing so we develop, to the best of
our knowledge, the first fully differentiable branching program.
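One of the simplest gradient estimation strategies for discrete randomness is the score-function (REINFORCE) estimator; a hedged sketch for a single Bernoulli branch (the payoffs are hypothetical):

```python
import random

# Score-function (REINFORCE) estimate of d/dp E[f(B)] for
# B ~ Bernoulli(p): average f(b) * d log P(b) / dp over samples.
def reinforce_grad(f, p, n, rng):
    total = 0.0
    for _ in range(n):
        b = 1 if rng.random() < p else 0
        score = (1 / p) if b == 1 else (-1 / (1 - p))
        total += f(b) * score
    return total / n

# E[f(B)] = p * f(1) + (1 - p) * f(0), so the true gradient is
# f(1) - f(0) = 3.0 for this hypothetical branch payoff.
rng = random.Random(42)
grad = reinforce_grad(lambda b: 4.0 if b else 1.0, 0.5, 200_000, rng)
print(grad)  # close to 3.0
```

Methods such as Stochastic AD aim to reduce the high variance that plain score-function estimates like this one exhibit.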
( 2
min )
These lecture notes provide an overview of existing methodologies and recent
developments for estimation and inference with high dimensional time series
regression models. First, we present main limit theory results for high
dimensional dependent data, which are relevant to covariance matrix structures as
well as to dependent time series sequences. Second, we present main aspects of
the asymptotic theory related to time series regression models with many
covariates. Third, we discuss various applications of statistical learning
methodologies for time series analysis purposes.
( 2
min )
Implicit neural networks have demonstrated remarkable success in various
tasks. However, there is a lack of theoretical analysis of the connections and
differences between implicit and explicit networks. In this paper, we study
high-dimensional implicit neural networks and provide the high dimensional
equivalents for the corresponding conjugate kernels and neural tangent kernels.
Built upon this, we establish the equivalence between implicit and explicit
networks in high dimensions.
( 2
min )
We’re excited to announce the availability of response streaming through Amazon SageMaker real-time inference. Now you can continuously stream inference responses back to the client when using SageMaker real-time inference to help you build interactive experiences for generative AI applications such as chatbots, virtual assistants, and music generators. With this new feature, you can start streaming the responses immediately when they’re available instead of waiting for the entire response to be generated. This lowers the time-to-first-byte for your generative AI applications. In this post, we’ll show how to build a streaming web application using SageMaker real-time endpoints with the new response streaming feature for an interactive chat use case. We use Streamlit for the sample demo application UI.
( 12
min )
Nowadays, the majority of our customers are excited about large language models (LLMs) and thinking about how generative AI could transform their business. However, bringing such solutions and models to business-as-usual operations is not an easy task. In this post, we discuss how to operationalize generative AI applications using MLOps principles, leading to foundation model operations (FMOps). Furthermore, we deep dive into the most common generative AI use case of text-to-text applications and LLM operations (LLMOps), a subset of FMOps. The following figure illustrates the topics we discuss.
( 23
min )
MIT Plasma Science and Fusion Center will receive DoE support to improve access to fusion data and increase workforce diversity.
( 8
min )
Entrepreneurs are cultivating generative AI from the west coast of Africa to the eastern edge of the Arabian Desert. Gen AI is the latest of the big plans Kofi Genfi and Nii Osae have been hatching since they met 15 years ago in high school in Accra, Ghana’s capital that sits on the Gulf of Read article >
( 7
min )
Academics Mory Gharib and Alireza Ramezani in 2020 were spitballing a transforming robot that is now getting a shot at work that’s literally out of this world: NASA Mars Rover missions. Caltech has unveiled its multi-talented robot that can fly, drive, walk and do eight permutations of motions through a combination of its skills. They Read article >
( 6
min )
Just like that, summer falls into September, and some of the most anticipated games of the year, like the Cyberpunk 2077: Phantom Liberty expansion, PAYDAY 3 and Party Animals, are dropping into the GeForce NOW library at launch this month. They’re part of 24 new games hitting the cloud gaming service in September. And the Read article >
( 8
min )
In this episode of the Microsoft Research Podcast, Managing Director of Microsoft Research India Sriram Rajamani discusses how generative AI is impacting the lab’s approach to research and how the country’s many languages can help advance conversational systems.
The post AI Frontiers: AI in India and beyond with Sriram Rajamani appeared first on Microsoft Research.
( 30
min )
Powered by Amazon Lex, the QnABot on AWS solution is an open-source, multi-channel, multi-language conversational chatbot. QnABot allows you to quickly deploy self-service conversational AI into your contact center, websites, and social media channels, reducing costs, shortening hold times, and improving customer experience and brand sentiment. In this post, we introduce the new Generative AI features for QnABot and walk through a tutorial to create, deploy, and customize QnABot to use these features. We also discuss some relevant use cases.
( 13
min )
This post demonstrates a strategy for fine-tuning publicly available LLMs for the task of radiology report summarization using AWS services. LLMs have demonstrated remarkable capabilities in natural language understanding and generation, serving as foundation models that can be adapted to various domains and tasks. There are significant benefits to using a pre-trained model. It reduces computation costs, reduces carbon footprints, and allows you to use state-of-the-art models without having to train one from scratch.
( 13
min )
In this issue: An illusion of predictability in scientific results; Kathleen Sullivan named to Insider’s 30 under 40 in healthcare list; FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations.
The post Research Focus: Week of August 28, 2023 appeared first on Microsoft Research.
( 9
min )
Each year, nearly 32 million people travel through the Bengaluru Airport, or BLR, one of the busiest airports in the world’s most populous nation. To provide such multitudes with a safer, quicker experience, the airport in the city formerly known as Bangalore is tapping vision AI technologies powered by Industry.AI.
( 6
min )
In the global entertainment landscape, TV show and film production stretches far beyond Hollywood or Bollywood — it’s a worldwide phenomenon. However, while streaming platforms have broadened the reach of content, dubbing and translation technology still has plenty of room for growth. Deepdub acts as a digital bridge, providing access to content by using generative…
( 5
min )
In the dynamic landscape of modern business, the art of seamless data migration has evolved into a strategic imperative. As you navigate the intricacies of workspace transformations, you’re met with a complex interplay of technological advancements and operational demands. Enter the era of leveraging Artificial Intelligence (AI) to redefine data migration – an approach that…
The post Data migration redefined: Leveraging AI trends for smooth workspace transitions appeared first on Data Science Central.
( 21
min )
Technology is leading the shipping and logistics industry through a transformative era, driven by rapid advancements that mark a pivotal moment in the digital shipping evolution. From automating routine processes to employing intelligent algorithms that predict and optimize routes, the technological revolution is redefining the way goods are transported…
The post The future of shipping: How technology is shaping logistics and fulfillment appeared first on Data Science Central.
( 23
min )
In the early days of the Internet, there were four ‘horsemen’ of the Internet. With IBM’s 4.5 billion investment in Hugging Face today, the generative AI landscape is becoming a bit clearer. There are four generative AI leaders emerging – others lagging – and one unknown. Let’s look at the four leaders of generative AI…
The post Generative AI megatrends: The four horsemen of Generative AI appeared first on Data Science Central.
( 18
min )
There seems to be an app for everything, and mental health is no exception. According to a report, the global mental health apps market size was valued at $5.2 billion in 2022 and is predicted to reach $26.36 billion by 2032, at a CAGR of 17.7% during the forecast period. Mental health apps have emerged…
The post The power of digital solutions: How mental health apps are transforming patient care appeared first on Data Science Central.
( 20
min )
Introduction In our rapidly digitizing world, how businesses and systems communicate is paramount. The bedrock of this communication lies in data exchange methods, which allow seamless information flow, driving operational efficiencies and enabling innovation. Over the years, various data exchange protocols have emerged, each boasting unique strengths and presenting challenges. As enterprises strive to integrate…
The post Modern data exchange methods: Exploring the strengths and limitations of leading protocols appeared first on Data Science Central.
( 23
min )
Dramatic gains in hardware performance have spawned generative AI, and a rich pipeline of ideas for future speedups will drive machine learning to new heights, Bill Dally, NVIDIA’s chief scientist and senior vice president of research, said today in a keynote. Dally described a basket of techniques in the works — some already showing impressive…
( 6
min )
As generative AI and large language models (LLMs) continue to drive innovations, compute requirements for training and inference have grown at an astonishing pace. To meet that need, Google Cloud today announced the general availability of its new A3 instances, powered by NVIDIA H100 Tensor Core GPUs. These GPUs bring unprecedented performance to all kinds…
( 6
min )
Janice K. Lee, a.k.a. Janice.Journal — the subject of this week’s In the NVIDIA Studio installment — is a TikTok sensation using AI to accelerate her creative process, find inspiration and automate repetitive tasks.
( 8
min )
In this post, we describe how to create an MLOps workflow for batch inference that automates job scheduling, model monitoring, retraining, and registration, as well as error handling and notification by using Amazon SageMaker, Amazon EventBridge, AWS Lambda, Amazon Simple Notification Service (Amazon SNS), HashiCorp Terraform, and GitLab CI/CD. The presented MLOps workflow provides a reusable template for managing the ML lifecycle through automation, monitoring, auditability, and scalability, thereby reducing the complexities and costs of maintaining batch inference workloads in production.
( 15
min )
As part of the 2023 Data Science Conference (DSCO 23), AWS partnered with the Data Institute at the University of San Francisco (USF) to conduct a datathon. Participants, both high school and undergraduate students, competed on a data science project that focused on air quality and sustainability. The Data Institute at USF aims to support cross-disciplinary research and education in the field of data science. The Data Institute and the Data Science Conference provide a distinctive fusion of cutting-edge academic research and the entrepreneurial culture of the technology industry in the San Francisco Bay Area.
( 5
min )
Posted by Dahun Kim and Weicheng Kuo, Research Scientists, Google
The ability to detect objects in the visual world is crucial for computer vision and machine intelligence, enabling applications like adaptive autonomous agents and versatile shopping systems. However, modern object detectors are limited by the manual annotations of their training data, resulting in a vocabulary size significantly smaller than the vast array of objects encountered in reality. To overcome this, the open-vocabulary detection task (OVD) has emerged, utilizing image-text pairs for training and incorporating new category names at test time by associating them with the image content. By treating categories as text embeddings, open-vocabulary detectors can predict a wide range of unseen objects. Various techniqu…
( 93
min )
Companies are discovering how accelerated computing can boost their bottom lines while making a positive impact on the planet. The NVIDIA RAPIDS Accelerator for Apache Spark, software that speeds data analytics, not only raises performance and lowers costs, it increases energy efficiency, too. That means it can help companies meet goals for net-zero emissions of…
( 6
min )
AI Weirdness: the strange side of machine learning
( 2
min )
In the ever-evolving battle against the digital dark forces, the defenders of the virtual realm find themselves facing a barrage of ever-advancing threats. From the labyrinthine corridors of the Deep Web to the stealthy maneuvers of nation-state actors, the cyber landscape is as treacherous as it is vast. As our dependency on digital infrastructure deepens…
The post Empowering cyber guardians: How AI is changing the landscape of protection appeared first on Data Science Central.
( 21
min )
We present an exact Bayesian inference method for discrete statistical
models, which can find exact solutions to many discrete inference problems,
even with infinite support and continuous priors. To express such models, we
introduce a probabilistic programming language that supports discrete and
continuous sampling, discrete observations, affine functions, (stochastic)
branching, and conditioning on events. Our key tool is probability generating
functions: they provide a compact closed-form representation of distributions
that are definable by programs, thus enabling the exact computation of
posterior probabilities, expectation, variance, and higher moments. Our
inference method is provably correct, fully automated and uses automatic
differentiation (specifically, Taylor polynomials), but does not require
computer algebra. Our experiments show that its performance on a range of
real-world examples is competitive with approximate Monte Carlo methods, while
avoiding approximation errors.
( 2
min )
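The generating-function machinery described above can be illustrated in a few lines of Python. The sketch below is ours, not the paper's implementation: it represents a PGF as a truncated Taylor polynomial, obtains the geometric distribution's PGF G(x) = p / (1 - (1-p)x) by Taylor-series division, and reads probabilities off the coefficients.

```python
# Toy illustration of exact inference via probability generating
# functions (our sketch, not the paper's tool): the Taylor
# coefficients of a PGF are exactly the probability masses P(X = k).

def taylor_div(num, den, n):
    """Coefficients of num(x)/den(x) up to degree n-1 (den[0] != 0)."""
    out = []
    for k in range(n):
        acc = num[k] if k < len(num) else 0.0
        for j in range(1, k + 1):
            if j < len(den):
                acc -= den[j] * out[k - j]
        out.append(acc / den[0])
    return out

p = 0.5
# geometric PGF: G(x) = p / (1 - (1-p) x)
coeffs = taylor_div([p], [1.0, -(1.0 - p)], 6)
# coeffs[k] == P(X = k) = p * (1-p)**k: 0.5, 0.25, 0.125, ...
trunc_mean = sum(k * c for k, c in enumerate(coeffs))  # truncated E[X]
```

Moments come from derivatives of G at 1; here we only sum the truncated series, so `trunc_mean` slightly underestimates the true mean (1-p)/p = 1.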
Mode connectivity is a phenomenon where trained models are connected by a
path of low loss. We reframe this in the context of Information Geometry, where
neural networks are studied as spaces of parameterized distributions with
curved geometry. We hypothesize that shortest paths in these spaces, known as
geodesics, correspond to mode-connecting paths in the loss landscape. We
propose an algorithm to approximate geodesics and demonstrate that they achieve
mode connectivity.
( 2
min )
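A toy picture of the phenomenon (our illustration, not the paper's geodesic algorithm): on a 2D ring-shaped loss, two minima are joined by a curved path that stays in the low-loss valley, while the straight line between them crosses a barrier.

```python
import numpy as np

def loss(w):
    # ring-shaped valley with minima near (1, 0) and (-1, 0)
    x, y = w
    return (x**2 + y**2 - 1.0)**2 + 0.1 * y**2

a, b = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
ts = np.linspace(0.0, 1.0, 101)

# straight segment between the two minima: passes through the origin
linear = [loss((1 - t) * a + t * b) for t in ts]
# curved path that follows the valley (the unit circle)
curved = [loss(np.array([np.cos(np.pi * t), np.sin(np.pi * t)])) for t in ts]

barrier_linear = max(linear)   # 1.0, reached at the midpoint (0, 0)
barrier_curved = max(curved)   # 0.1: the valley path avoids the barrier
```

The paper's hypothesis is that such low-loss connecting paths coincide with geodesics under the information-geometric metric; in this toy, the circle plays that role only by construction.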
I study a stochastic multi-armed bandit problem where rewards are subject to
adversarial corruption. I propose a novel attack strategy that manipulates a
learner employing the UCB algorithm into pulling some non-optimal target arm $T
- o(T)$ times with a cumulative cost that scales as $\widehat{O}(\sqrt{\log
T})$, where $T$ is the number of rounds. I also prove the first lower bound on
the cumulative attack cost. The lower bound matches the upper bound up to
$O(\log \log T)$ factors, showing the proposed attack strategy to be near
optimal.
( 2
min )
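The flavour of such an attack can be sketched as follows. This is a deliberately crude strategy of our own, not the paper's: a constant-gap corruption applied whenever the UCB learner pulls a non-target arm, whose cumulative cost grows with the O(log T) non-target pulls rather than the paper's tighter bound.

```python
import math, random

def attack_ucb(T=5000, seed=0):
    """UCB1 learner on two arms; the attacker corrupts non-target
    rewards so the suboptimal target arm is pulled almost always."""
    rng = random.Random(seed)
    means = [0.9, 0.5]                 # arm 1 is the suboptimal target
    target = 1
    counts, sums = [0, 0], [0.0, 0.0]
    cost = 0.0
    for t in range(1, T + 1):
        if t <= 2:
            arm = t - 1                # pull each arm once first
        else:
            arm = max(
                range(2),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = rng.gauss(means[arm], 0.1)
        if arm != target:              # corrupt non-target observations
            corrupted = min(reward, means[target] - 0.3)
            cost += reward - corrupted
            reward = corrupted
        counts[arm] += 1
        sums[arm] += reward
    return counts[target], cost

pulls, cost = attack_ucb()
# the target arm ends up pulled in the vast majority of rounds
```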
We propose a novel master-slave architecture to solve the top-$K$
combinatorial multi-armed bandits problem with non-linear bandit feedback and
diversity constraints, which, to the best of our knowledge, is the first
combinatorial bandits setting considering diversity constraints under bandit
feedback. Specifically, to efficiently explore the combinatorial and
constrained action space, we introduce six slave models with distinct merits that generate diversified samples balancing rewards, constraints, and efficiency. Moreover, we propose teacher-learning-based optimization and a policy co-training technique to boost the performance of the multiple
slave models. The master model then collects the elite samples provided by the
slave models and selects the best sample estimated by a neural contextual
UCB-based network to make a decision with a trade-off between exploration and
exploitation. Thanks to the elaborate design of slave models, the co-training
mechanism among slave models, and the novel interactions between the master and
slave models, our approach significantly surpasses existing state-of-the-art
algorithms in both synthetic and real datasets for recommendation tasks. The
code is available at:
\url{https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits}.
( 2
min )
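A drastically simplified sketch of the master–slave pattern (two toy slaves and a UCB-scoring master, versus the paper's six slave models and neural contextual UCB network; all names here are ours): slaves propose candidate top-K arm subsets, and the master plays the candidate with the best optimistic score.

```python
import math, random

def master_slave_topk(T=2000, N=8, K=3, seed=0):
    rng = random.Random(seed)
    means = [i / N for i in range(N)]       # arm N-1 is best
    counts, sums = [0] * N, [0.0] * N

    def ucb(a, t):                          # optimistic value estimate
        if counts[a] == 0:
            return float("inf")
        return sums[a] / counts[a] + math.sqrt(2 * math.log(t + 1) / counts[a])

    def slave_random():                     # exploration-heavy slave
        return rng.sample(range(N), K)

    def slave_greedy():                     # exploitation-heavy slave
        est = lambda a: sums[a] / counts[a] if counts[a] else 1.0
        return sorted(range(N), key=est, reverse=True)[:K]

    for t in range(T):
        candidates = [slave_random(), slave_random(), slave_greedy()]
        # master: play the candidate subset with the best optimistic score
        chosen = max(candidates, key=lambda S: sum(ucb(a, t) for a in S))
        for a in chosen:
            counts[a] += 1
            sums[a] += rng.gauss(means[a], 0.1)
    return counts

counts = master_slave_topk()
# pulls concentrate on the highest-mean arms
```

The diversity of slave proposals keeps exploration alive while the master's UCB scoring drives exploitation, loosely mirroring the division of labour the abstract describes.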
This paper presents a set of industrial-grade text processing models for
Hungarian that achieve near state-of-the-art performance while balancing
resource efficiency and accuracy. Models have been implemented in the spaCy
framework, extending the HuSpaCy toolkit with several improvements to its
architecture. Unlike existing NLP tools for Hungarian, our pipelines cover all basic text processing steps, including tokenization, sentence-boundary detection, part-of-speech tagging, morphological feature tagging, lemmatization, dependency parsing, and named entity recognition, with high accuracy and throughput. We thoroughly evaluated the proposed
enhancements, compared the pipelines with state-of-the-art tools and
demonstrated the competitive performance of the new models in all text
preprocessing steps. All experiments are reproducible and the pipelines are
freely available under a permissive license.
( 2
min )
A prevalent practice in recommender systems consists of averaging item
embeddings to represent users or higher-level concepts in the same embedding
space. This paper investigates the relevance of such a practice. For this
purpose, we propose an expected precision score, designed to measure the
consistency of an average embedding relative to the items used for its
construction. We subsequently analyze the mathematical expression of this score
in a theoretical setting with specific assumptions, as well as its empirical
behavior on real-world data from music streaming services. Our results
emphasize that real-world averages are less consistent for recommendation,
which paves the way for future research to better align real-world embeddings
with assumptions from our theoretical setting.
( 2
min )
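The consistency question above can be illustrated numerically. The score below is a simplified stand-in for the paper's expected precision (our construction, not the authors'): build a "user" as the mean of some item embeddings and check what fraction of that mean's nearest neighbours are the items it was built from.

```python
import numpy as np

def avg_embedding_precision(item_idx, catalog, k=None):
    """Fraction of the average embedding's top-k cosine neighbours
    that are the items used to build the average."""
    if k is None:
        k = len(item_idx)
    avg = catalog[item_idx].mean(axis=0)
    sims = catalog @ avg / (
        np.linalg.norm(catalog, axis=1) * (np.linalg.norm(avg) + 1e-12)
    )
    top = np.argsort(-sims)[:k]
    return len(set(top.tolist()) & set(item_idx)) / k

rng = np.random.default_rng(0)
catalog = rng.normal(size=(1000, 32))
idx = list(range(10))
# make the first ten items a coherent "taste" cluster
catalog[idx] = rng.normal(size=32) + 0.05 * rng.normal(size=(10, 32))
tight = avg_embedding_precision(idx, catalog)  # close to 1.0 here
```

With an incoherent item set the score drops sharply, which is the regime where the paper finds real-world averages to be less consistent.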
In an age where data has become the lifeblood of businesses, deciphering this raw data to yield actionable insights is critical. Here is where the role of business analytics comes into play. Business analytics, a blend of data management, business intelligence, and predictive modeling, is a field dedicated to driving business strategies through the lens…
The post Data visualization: The underrated skill in business analytics appeared first on Data Science Central.
( 22
min )
Today, we’re pleased to announce the preview of Amazon SageMaker Profiler, a capability of Amazon SageMaker that provides a detailed view into the AWS compute resources provisioned while training deep learning models on SageMaker. With SageMaker Profiler, you can track all activities on CPUs and GPUs, such as CPU and GPU utilization, kernel runs on GPUs, kernel launches on CPUs, sync operations, memory operations across GPUs, latencies between kernel launches and corresponding runs, and data transfer between CPUs and GPUs. In this post, we walk you through the capabilities of SageMaker Profiler.
( 9
min )
A one-week summer program aims to foster a deeper understanding of machine-learning approaches in health among curious young minds.
( 10
min )
The MIT and Accenture Convergence Initiative for Industry and Technology selects three new research projects to support.
( 9
min )
With a new technique, a robot can reason efficiently about moving objects using more than just its fingertips.
( 10
min )
As part of NVIDIA and Microsoft’s collaboration to bring more choice to gamers, new Microsoft Store integration has been added to GeForce NOW that lets gamers stream select titles from the Xbox PC Game Pass catalog on GeForce NOW, starting today. With the Microsoft Store integration, members will see a brand-new Xbox button on supported…
( 8
min )
Mens, Manus and Machina (M3S) will design technology, training programs, and institutions for successful human-machine collaboration.
( 9
min )
Persistent Systems, a global digital engineering provider, has run several pilots and formal studies with Amazon CodeWhisperer that point to shifts in software engineering, generative AI-led modernization, responsible innovation, and more. This post highlights four themes emerging from Persistent’s Amazon CodeWhisperer experiments that could change software engineering as we know it.
( 8
min )
In this post, we walk you through importing data from, and exporting data to, an S3 access point in SageMaker Data Wrangler.
( 6
min )
In this post, we discuss how to implement federated learning on Amazon SageMaker to run ML with decentralized training data.
( 13
min )
MIT system demonstrates greater than 100-fold improvement in energy efficiency and a 25-fold improvement in compute density compared with current systems.
( 9
min )
On the eve of Gamescom, NVIDIA announced NVIDIA DLSS 3.5 featuring Ray Reconstruction — a new neural rendering AI model that creates more beautiful and realistic ray-traced visuals than traditional rendering methods — for real-time 3D creative apps and games.
( 8
min )
The latest advancements in AI for gaming are in the spotlight today at Gamescom, the world’s largest gaming conference, as NVIDIA introduced a host of technologies, starting with DLSS 3.5, the next step forward of its breakthrough AI neural rendering technology. DLSS 3.5, NVIDIA’s latest innovation in AI-powered graphics, is an image quality upgrade incorporated…
( 6
min )
Data science was a vaguely defined discipline to begin with, but it’s shaped up substantially lately. Execs now yearn to take immediate advantage of generative and other clearly useful (if currently problematic) kinds of AI. That demand suggests an opportunity for influencers and visionaries in organizations to lobby for each organization to build an AI-ready…
The post Beyond data science: A knowledge foundation for the AI-ready enterprise appeared first on Data Science Central.
( 21
min )
We are happy to announce that SageMaker Data Wrangler now supports using Lake Formation with Amazon EMR to provide fine-grained data access restriction.
( 12
min )
Bill Dally — one of the world’s foremost computer scientists and head of NVIDIA’s research efforts — will describe the forces driving accelerated computing and AI in his keynote address at Hot Chips, an annual gathering of leading processor and system architects. Dally will detail advances in GPU silicon, systems and software that are delivering…
( 5
min )
My journey continues as I integrate a GenAI tool (Bing AI) with my Thinking Like a Data Scientist (TLADS) methodology. In part 1 of this series, I used Bing AI to validate, augment, and enhance the first three steps in the TLADS methodology (Figure 1): And the results yielded a much deeper understanding of the…
The post Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part II appeared first on Data Science Central.
( 23
min )
A new $5+ million partnership aims to explore ways the development of artificial intelligence (AI) can support a thriving, innovative local news field, and ensure local news organizations shape the future of this emerging technology.
( 3
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks, with four concurrent presentations in each track. […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )